A Three-Way Decision Approach to Email Spam Filtering

نویسندگان

  • Bing Zhou
  • Yiyu Yao
  • Jigang Luo
چکیده

Many classification techniques used for identifying spam emails, treat spam filtering as a binary classification problem. That is, the incoming email is either spam or non-spam. This treatment is more for mathematical simplicity other than reflecting the true state of nature. In this paper, we introduce a three-way decision approach to spam filtering based on Bayesian decision theory, which provides a more sensible feedback to users for precautionary handling their incoming emails, thereby reduces the chances of misclassification. The main advantage of our approach is that it allows the possibility of rejection, i.e., of refusing to make a decision. The undecided cases must be re-examined by collecting additional information. A loss function is defined to state how costly each action is, a pair of threshold values on the posterior odds ratio is systematically calculated based on the loss function, and the final decision is to select the action for which the overall cost is minimum. Our experimental results show that the new approach reduces the error rate of classifying a legitimate email to spam, and provides better spam precision and weighted accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Three-Way Decisions Solution to Filter Spam Email: An Empirical Study

A three-way decisions solution based on Bayesian decision theory for filtering spam emails is examined in this paper. Compared to existed filtering systems, the spam filtering is no longer viewed as a binary classification problem. Each incoming email is accepted as a legitimate or rejected as a spam or undecided as a further-exam email by considering the misclassification cost. The three-way d...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

Layout Based Spam Filtering

Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel a...

متن کامل

Layout Based Spam Filtering Claudiu

to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel appro...

متن کامل

Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques

Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in the dramatic increase of spam emails during the past few years. As spammers always try to find a way to evade existing filters, new filters need to be developed to catch spam. Generally, the main tool for email filtering is based on text classification. A classifi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010